-
Notifications
You must be signed in to change notification settings - Fork 475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix --skip matching #2058
base: main
Are you sure you want to change the base?
Fix --skip matching #2058
Conversation
d4ead50
to
35f5951
Compare
I believe one of the tests cannot pass until the patch is merged. |
Are you referring to the failures here: Yes I'd agree, it's running against master, but you've obviously updated the config to work post merge. |
Yes, that's the test. This patch is obviously breaking backwards compatibility, as demonstrated by the test. Would it be worth keeping both behaviours for some time, and deprecate the old behaviour in a future release? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @DimitriPapadopoulos .
Please can we have some tests to go with it, both to confirm the before behaviour, prove you've fixed the bug (and ensure we don't break it again in future) and maybe to validate the behaviour of some of my more theoretical issue comments? I don't think they're necessarily deal breakers but it would be good to at least check how we currently behave.
Given a file foo in:
bar/foo
And a directory:
doc/foo/baz
or
foo/baz
Can I skip one without it also skipping the other?
codespell_lib/_codespell.py
Outdated
# Pattern might be a list of comma-delimited strings | ||
self.pattern_list = ','.join(pattern).split(',') | ||
# Pattern might be a list of comma-separated patterns, | ||
# we remove any trailing path separator from each pattern |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a brief reason for why? Is it just consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 reasons::
- The trailing separator can be copied/pasted from the command line, at least in some Linux setups. It did happen to me, perhaps using
<TAB>
for completion under Bash:$ ls foo/bar/
- The trailing separator appears in issues Behavior of --skip with folder names #1915 and --skip doesn't work #2047, adding to the confusion when trying to understand what's going wrong.
It's clear to me that removing the trailing space does what end-users would expect. At least when dealing with directories. The only caveat could be -S foo/bar/
, some end-users could interpret it as match foo/bar
only if it is a directory. But this is far-fetched. The current behaviour is worse, -S doc/foo/baz/
simply cannot match anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could remove this specific commit for now, and then attempt to match directories only without the separator. That's more complicated, though, and better suited to a new PR later on. But then why would a user prepend a separator to a file name? That's really an obscure corner case that shouldn't ever happen. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend accepting this commit for now. Then I can work on limiting the possibility of removing a trailing separator for directories only in an upcoming pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry, I wasn't necessarily meaning you have to change it, just that the comment doesn't explain why for someone revisiting it. You've currently written "We paint the room red", I'm suggesting that "We paint the room red, to annoy the bull" will be more useful in the future. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I have added something like "annoy the bull".
codespell_lib/_codespell.py
Outdated
@@ -95,8 +95,10 @@ class QuietLevels(object): | |||
class GlobMatch(object): | |||
def __init__(self, pattern): | |||
if pattern: | |||
# Pattern might be a list of comma-delimited strings | |||
self.pattern_list = ','.join(pattern).split(',') | |||
# Pattern might be a list of comma-separated patterns, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what would happen here if you had a file you wanted to skip called foo,bar
but that's probably one for another issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree about the use case of a file name including a comma. However, this pull request doesn't alter the behaviour of the -S foo,bar
option, does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, sorry I'm basically reviewing the code when I'm reviewing your changes, given it's unlikely a TODO in here saying this will break with files with a comma in is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the pattern separator is a comma, I think it's obvious we cannot handle patterns containing a comma. Perhaps we should make that clear in the documentation:
Because the comma is a pattern separator, patterns cannot include commas. In the unlikely event you need to match a path containing a comma, please consider working around this limitation using globs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we could introduce an escape mechanism, perhaps \,
. But I'd wait for actual users to ask for it...
codespell_lib/_codespell.py
Outdated
for root, dirs, files in os.walk(filename): | ||
if glob_match.match(root): # skip (absolute) directories | ||
rel_root = os.path.relpath(root, filename) | ||
if glob_match.match(rel_root): # skip relative directory path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading this right, given the file structure:
foo/bar/foo/baz
I think os.walk will descend and therefore test foo/, foo/bar/ and foo/bar/foo (if I'm reading it right). I think this means your relative check could match both foos, although I'm not sure this is necessarily an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which exact pattern do you have in mind?
-S foo
matches used to match and still matchesfoo/
andfoo/bar/foo/
- or so I think and will double-check.-S foo/bar/foo
will match onlyfoo/bar/foo/
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't be fooled by the absolute → relative change in the comment. The comments were using absolute and relative incorrectly, I have just fixed the comments, not necessarily changed the behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've added the relpath function too, does this really have no impact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The relpath()
function helps match the relative path foo/bar
in addition to the "absolute" paths ./foo/bar
or /other/path/to/top/level/foo/bar
. I don't really see an issue here, but I may be missing something of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope the tests I have added address your concerns. If not, happy to modify them or add more tests.
Yes! I hadn't really thought about it, but this will break all the places where I've configured skip in exactly the same way... As per my initial issue ( #1915 (comment) ), is there some reason we can't just support all variants so people don't have to remember a particular version? |
I see the previous variant as broken. In the case of a
That said, we could support all variants. It's just that the old variant seems totally broken to me, I spent half a day trying to make sense out of it. |
I totally agree about adding tests -also to double-check my above assertions! |
The test directory looks like this: $ mkdir codespell_test
$
$ mkdir codespell_test/foo
$ echo "errror" > codespell_test/foo/bar
$
$ mkdir -p codespell_test/doc/foo/baz
$ echo "errror" > codespell_test/doc/foo/baz/file
$
$ tree codespell_test
codespell_test
├── doc
│ └── foo
│ └── baz
│ └── file
└── foo
└── bar
4 directories, 2 files
$ Here are some tests with the old behaviour, and as you can see, $ codespell -S foo/bar,doc/foo/baz # FAIL
./doc/foo/baz/file:1: errror ==> error
./foo/bar:1: errror ==> error
$
$ codespell -S ./foo/bar,./doc/foo/baz # PASS
$
$ codespell -S ./foo/bar/,./doc/foo/baz/ # FAIL
./doc/foo/baz/file:1: errror ==> error
./foo/bar:1: errror ==> error
$
$ codespell -S foo # PASS
$
$ codespell $PWD -S foo # PASS
$
$ codespell "$PWD" -S "$PWD/foo/bar,$PWD/doc/foo/baz" # PASS
$
$ codespell "$PWD" -S foo/bar,doc/foo/baz # FAIL
/tmp/codespell_test/doc/foo/baz/file:1: errror ==> error
/tmp/codespell_test/foo/bar:1: errror ==> error
$
$ codespell -S "$PWD/foo/bar,$PWD/doc/foo/baz" # FAIL
./doc/foo/baz/file:1: errror ==> error
./foo/bar:1: errror ==> error
$
$ codespell "$PWD" -S ./foo/bar,./doc/foo/baz # FAIL
./doc/foo/baz/file:1: errror ==> error
./foo/bar:1: errror ==> error
$ |
Here are some tests with the new behaviour: $ codespell -S foo/bar,doc/foo/baz # PASS
$
$ codespell -S ./foo/bar,./doc/foo/baz # FAIL (but I plan on restoring the old variant!)
./doc/foo/baz/file:1: errror ==> error
./foo/bar:1: errror ==> error
$
$ codespell -S foo/bar/,doc/foo/baz/ # PASS (I agree that may feel strange because foo/bar is a file, not a directory)
$
$ codespell -S foo # PASS
$
$ codespell $PWD -S foo # PASS
$
$ codespell "$PWD" -S "$PWD/foo/bar,$PWD/doc/foo/baz" # FAIl (but I plan on restoring the old variant!)
/tmp/codespell_test/doc/foo/baz/file:1: errror ==> error
/tmp/codespell_test/foo/bar:1: errror ==> error
$
$ codespell -S "$PWD/foo/bar,$PWD/doc/foo/baz" # FAIL (as in the old variant)
./doc/foo/baz/file:1: errror ==> error
./foo/bar:1: errror ==> error
$
$ codespell "$PWD" -S ./foo/bar,./doc/foo/baz # FAIL (as in the old variant)
/tmp/codespell_test/doc/foo/baz/file:1: errror ==> error
/tmp/codespell_test/foo/bar:1: errror ==> error
$
$ codespell "$PWD" -S foo/bar,doc/foo/baz # PASS
$ |
35f5951
to
e75e4a6
Compare
And now after the last commit, combining both behaviours: $ codespell -S foo/bar,doc/foo/baz # PASS
$
$ codespell -S ./foo/bar,./doc/foo/baz # PASS
$
$ codespell -S foo/bar/,doc/foo/baz/ # PASS (I agree that may feel strange because foo/bar is a file, not a directory)
$
$ codespell -S foo # PASS
$
$ codespell $PWD -S foo # PASS
$
$ codespell "$PWD" -S "$PWD/foo/bar,$PWD/doc/foo/baz" # PASS
/tmp/codespell_test/doc/foo/baz/file:1: errror ==> error
/tmp/codespell_test/foo/bar:1: errror ==> error
$
$ codespell -S "$PWD/foo/bar,$PWD/doc/foo/baz" # FAIL (as in the old variant)
./doc/foo/baz/file:1: errror ==> error
./foo/bar:1: errror ==> error
$
$ codespell "$PWD" -S ./foo/bar,./doc/foo/baz # FAIL (as in the old variant)
/tmp/codespell_test/doc/foo/baz/file:1: errror ==> error
/tmp/codespell_test/foo/bar:1: errror ==> error
$
$ codespell "$PWD" -S foo/bar,doc/foo/baz # PASS
$ |
I would still recommend deprecating the old behaviour in the future, as it is counterintuitive and confusing. Anyway, for now we have both. |
I'm not disputing that, but equally I don't want to have to do a big bang fix on the many repos I've pushed spellchecking on which match that pattern.
Perfect thanks. I guess we should introduce a message at some point (now?) telling people they should migrate their config? |
Great thanks. Did you also test with a file named foo? Sorry, I meant a test for this behaviour: Not simply testing it works with the code. I.e. something that will automatically run and confirm if it changes in future. See what we've already got here (and possibly other tests): |
I think the message to end-users can wait. Let users get acquainted with the new behaviour first. However, I should probably guard the old behaviour with something like this: #ifndef CODESPELL_DISCARD_DEPRECATED This way future maintainers are informed the code should be deprecated, then removed at some point. How does this sound? Can you recommend a better alternative to EDIT: This is Python, not C 😄 so instead see #2058 (comment) |
I'm not sure what you want me to test. A directory and a file named |
4d051a1
to
bcef192
Compare
I have add a phony option "deprecated" set to |
I have also changed the documentation in the README. |
bcef192
to
41fbfe5
Compare
996ea7f
to
2e27007
Compare
I have restored these files to their previous state to avoid CI errors:
I will modify them in a subsequent PR if this one is merged. |
2e27007
to
aaa8bd2
Compare
@peternewman Lots of new tests added in |
aa4dd4c
to
951b1a1
Compare
951b1a1
to
5b0023f
Compare
@peternewman Do you need me to fix anything else? The CI error will vanish when #2069 is merged. |
5b0023f
to
a9fd70d
Compare
a9fd70d
to
26e740c
Compare
a4007aa
to
73ed4e1
Compare
@peternewman I think I have added enough tests. All of them pass after renaming The only remaining check failure appears to be with flake8, but that's a flake8 “feature” that I cannot fix in the source code. Please merge #2069 which fixes flake8 options. @larsoner Could you have a look at this PR? Edit: I have added even more tests with corner cases and found a discrepancy in the current handling of the old syntax (
I have added additional tests in #2227. They show the current limitations of |
f2299b1
to
17305bd
Compare
17305bd
to
12dae95
Compare
12dae95
to
95dbb4c
Compare
Shell completion often adds a separator '/' after directory names. Because os.walk() return paths without a trailing separator, we have to remove trailing separators from patterns to ensure a proper match. Granted, we could attempt to remove the trailing separator '/' only from "directory patterns", not from "file patterns", but patterns are currently just patterns - we do not distinguish between "file patterns" and "directory patterns". A future improvment could be to interpret a pattern with a trailing "/" as "directory pattern", but then how would we define a "file pattern"? It's probably better to find a different mechanism and keep discarding the trailing separator from patterns.
The --skip arguments are interpreted as paths relative to the directories to check (the "file" arguments passed to codespell), in addition to being interpreted "as is". The benefit is that "codespell ." and "codespell $PWD" behave the same, you just pass relative paths to --skip. No more adding "./" to directory patterns.
95dbb4c
to
a9337a5
Compare
codespell
asfile
arguments.Fixes #1915 and #2047.